Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process
نویسندگان
چکیده
We present the design of a corpus of native and non-native speech for the language pair French-German, with a special emphasis on phonetic and prosodic aspects. To our knowledge there is no suitable corpus, in terms of size and coverage, currently available for the target language pair. To select the target L1-L2 interference phenomena we prepare a small preliminary corpus (corpus1), which is analyzed for coverage and cross-checked jointly by French and German experts. Based on this analysis, target phenomena on the phonetic and phonological level are selected on the basis of the expected degree of deviation from the native performance and the frequency of occurrence. 14 speakers performed both L2 (either French or German) and L1 material (either German or French). This allowed us to test, recordings duration, recordings material, the performance of our automatic aligner software. Then, we built corpus2 taking into account what we learned about corpus1. The aims are the same but we adapted speech material to avoid too long recording sessions. 100 speakers will be recorded. The corpus (corpus1 and corpus2) will be prepared as a searchable database, available for the scientific community after completion of
منابع مشابه
The IFCASL Corpus of French and German Non-native and Native Read Speech
The IFCASL corpus is a French-German bilingual phonetic learner corpus designed, recorded and annotated in a project on individualized feedback in computer-assisted spoken language learning. The motivation for setting up this corpus was that there is no phonetically annotated and segmented corpus for this language pair of comparable of size and coverage. In contrast to most learner corpora, the...
متن کاملInter-annotator agreement for a speech corpus pronounced by French and German language learners
This paper presents the results of an investigation of interannotator agreement for the non-native and native French part of the IFCASL corpus. This large bilingual speech corpus for French and German language learners was manually annotated by several annotators. This manual annotation is the starting point which will be used both to improve the automatic segmentation algorithms and derive dia...
متن کاملFrench Learners Audio Corpus of German Speech (FLACGS)
The French Learners Audio Corpus of German Speech (FLACGS) was created to compare German speech production of German native speakers (GG) and French learners of German (FG) across three speech production tasks of increasing production complexity: repetition, reading and picture description. 40 speakers, 20 GG and 20 FG performed each of the three tasks, which in total leads to approximately 7h ...
متن کاملRole of Monolingualism/Bilingualism on Pragmatic Awareness and Production of Apology Speech Act of English as a Second and Third Language
The present study investigated the pragmatic awareness and production of Iranian Turkish and Persian EFL learners in the speech act of apology. Sixty-eight learners of English studying at several universities in Iran were selected based on simple random sampling as the monolingual and bilingual participants. Data were elicited by means of a written discourse self-assessment/completion test (WDS...
متن کاملAddressing Code-Switching in French/Algerian Arabic Speech
This study focuses on code-switching (CS) in French/Algerian Arabic bilingual communities and investigates how speech technologies, such as automatic data partitioning, language identification and automatic speech recognition (ASR) can serve to analyze and classify this type of bilingual speech. A preliminary study carried out using a corpus of Maghrebian broadcast data revealed a relatively hi...
متن کامل